eXtract: a snippet generation system for XML search

Authors

  • Yu Huang
  • Ziyang Liu
  • Yi Chen
Abstract

Snippets are used by almost every text search engine to complement ranking schemes in order to effectively handle user keyword search. Despite the fact that XML is a standard representation format of web data, research on generating result snippets for XML search remains untouched. In this work, we present eXtract, a system that efficiently generates self-contained result snippets within a given size bound which effectively summarize the query results and differentiate them from one another, according to which users can quickly assess the relevance of the query results.


Similar resources

Bridging the Gap between Intrinsic and Perceived Relevance in Snippet Generation

Snippet generation plays an important role in a search engine. Good snippets give users a good indication of the main content of a search result related to the query and of whether they can find relevant information in it. Previous studies on snippet generation focused on selecting sentences that are related to the query and to the document. However, the resulting snippet may look highly relevant...


Parsing the Wiki Collection and Snippet Generation (M.S. thesis, University of Minnesota, by Sai Subramanyam Chittilla)

Information Retrieval (IR) is a field which deals with retrieving useful information from large sets of data in response to a query. Much information in this digital age is stored in XML format, which associates a structure with a document. Though IR systems have been used for years to access documents, the field has greatly expanded with the emergence of the world wide web, which emphasizes th...


From Focused Elements to Snippets (M.S. thesis, University of Minnesota, by Supraja Nagalla)

Information Retrieval is a field of computing which traditionally deals with searching a large collection of documents and retrieving documents based on their similarity to the query. INEX [10] provides a platform (e.g., document collection, queries and uniform evaluation metrics) for the development and evaluation of retrieval algorithms for XML documents. The focus of INEX is to reduce the gr...


Compression of Semistructured Documents

EGOTHOR is a search engine that indexes the Web and allows us to search the Web documents. Its hit list contains URL and title of the hits, and also some snippet which tries to shortly show a match. The snippet can be almost always assembled by an algorithm that has a full knowledge of the original document (mostly HTML page). It implies that the search engine is required to store the full text...


Automatic Snippet Generation for Music Reviews

Review aggregator sites (RottenTomatoes.com, Metacritic.com) use snippets to convey the overall gist of the reviews they include in their coverage. These snippets are typically sentences extracted directly from the original review. In this paper we focus on snippet generation in the domain of music reviews—that is, how do you choose a snippet from a music review that best captures the opinion o...



Journal:
  • PVLDB

Volume 1

Publication date: 2008